Fujitsu Laboratories TREC8 Report - Ad hoc, Small Web, and Large Web Track
Authors
Abstract
This year a Fujitsu Laboratory team participated in three tracks: ad hoc, small web, and large web. As basic techniques, we compared four popular stemmers and developed a simple stop-pattern removal technique for TREC queries. For the ad hoc task and the small web track, we used the same techniques. We experimented with area weighting, co-occurrence boosting, bi-gram utilization, and reranking by bi-grams extracted from a pilot search. The effect of blind application of those techniques is rather limited, or even uncertain, in the TREC8 experiment. What we can say from the TREC8 results is that blind application of co-occurrence boosting and area weighting may be effective for the small web track. These techniques require query-dependent application. In the large web track, our main interest is efficiency, that is, how many resources are required to process 100GB of web text and 10000 real web queries in practical time. Using a statistics-based language type checker, we can eliminate 23% of non-English text. This speeds up indexing and reduces the index size. Searching an inverted file is CPU-intensive if the target machine has main memory in excess of 10-25% of the index size. So with simple but effective index compression methods, the throughput of query processing is about 0.54-1.1 queries/second even on a single 300MHz UltraSPARC processor.
Similar resources
Fujitsu Laboratories TREC8 Report
This year a Fujitsu Laboratory team participated in three tracks: ad hoc, small web, and large web. As basic techniques, we compared four popular stemmers and developed a simple stop-pattern removal technique for TREC queries. For the ad hoc task and the small web track, we used the same techniques. We experimented with area weighting, co-occurrence boosting, bi-gram utilization,...
PLIERS at TREC8
The use of the PLIERS text retrieval system in TREC8 experiments is described. The tracks entered are: Ad-Hoc, Filtering (Batch and Routing), and the Web Track (Large only). We describe both retrieval efficiency and effectiveness results for all these tracks. We also describe some preliminary experiments with BM_25 tuning constant variation.
An Early DiscoWeb Prototype at TREC8
Recently the notion of popularity and its generalizations have been investigated as a possible alternative approach to text only analysis to rank web pages in search engines (e.g. [Kle98, BP98, CDR98, CDDG98, BH98, HHMN99] among others). We have built a research prototype that incorporates many link analysis algorithms from the literature and also new algorithms to investigate the impact of the...
Fujitsu Laboratories TREC9 Report
This year a Fujitsu Laboratory team participated in web tracks. For TREC9 we experimented with passage retrieval, which is expected to be effective for Web pages that contain more than one topic. To split documents into passages, we used an NLP-based paragraph-detecting program rather than a fixed (variable) window size. But it did not produce better results for TREC9 Web data. For indexing large web data faster, ...
Fujitsu Laboratories TREC2001 Report
This year a Fujitsu Laboratory team participated in web tracks. For both the ad hoc task and the entry point search task, we combined the score of normal ranking search with that of page ranking techniques. For the ad hoc style task, the effect of page ranking was very limited. We got only a very small improvement for title field search, and the page rank was not effective for description, and narrative field sear...